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Recently, ICASE has begun differentiating between reports with a mathemat- 
ical or applied science theme and reports whose main emphasis is some aspect of 
computer science by producing the computer science reports with a yellow cover. 
The blue cover reports will now emphasize mathematical research. In all other 
aspects the reports will remain the same; in particular, they will continue to be 
submitted to the appropriate journals or conferences for formal publication. 
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The Use of Lanczos’s Method to Solve the 
Large Generalized Symmetric Definite 
Eigenvalue Problem 

Mark T. Jones*and Merrell L. Patrick*^ 


Abstract 

The generalized eigenvalue problem, Kx = A Afar, is of signifi- 
cant practical importance, especially in structural engineering where 
it arises as the vibration and buckling problems. A new algorithm, 
LANZ, based on Lanczos’s method is developed. LANZ uses a tech- 
nique called dynamic shifting to improve the efficiency and reliability 
of the Lanczos algorithm. A new algorithm for solving the tridiagonal 
matrices that arise when using Lanczos’s method is described. A mod- 
ification of Parlett and Scott’s selective orthogonalization algorithm 
is proposed. Results from an implementation of LANZ on a Convex 
C-220 show it to be superior to a subspace iteration code. 
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1 Introduction 

The solution of the symmetric generalized eigenvalue problem, 

Kx = A Mx, (1) 

where K and M are real, symmetric matrices, and either K or M is positive 
semi-definite, is of significant practical importance, especially in structural 
engineering as the vibration problem and the buckling problem [BH87]. The 
matrices K and M are either banded or sparse. Usually p « n of the small- 
est eigenvalues of Equation 1 are sought, where n is the order of the system. 
The method of Lanczos, suitably altered for the generalized eigenvalue prob- 
lem, is shown to be useful for the efficient solution of Equation 1. [Lan50] 
[NOPEJ87]. 

A sophisticated algorithm, based on the simple Lanczos algorithm, is 
developed in this paper. The algorithm, called LANZ, has been imple- 
mented on supercomputer architectures at NASA Langley Research Center 
and results from the implementation are discussed. Two applications from 
structural engineering are described in Section 2. The properties of the gener- 
alized eigenvalue problem and solution methods are given in Section 3. The 
simple Lanczos algorithm is presented in Section 4. The use of Lanczos’s 
method for the generalized eigenvalue problem and the LANZ algorithm 
are discussed in Section 5. An execution time cost analysis of LANZ is 
developed in Section 6. The solution of the tridiagonal matrices that arise 
when using Lanczos’s method is considered in Section 7. Methods for solving 
the symmetric indefinite linear systems that arise in LANZ are described in 
Section 8. In Section 9, the performance of the LANZ algorithm is analyzed 
and compared to the performance of subspace iteration, the most prevalent 
method for solving this class of problems in structural engineering. 
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2 Problems of Interest 

Two important applications of LANZ which arise in structural engineering 
are the vibration problem and the buckling problem. Practical problems 
from these applications will be used to test the performance of the program. 

In the free vibration problem, the vibration frequencies, w, and mode 
shape vectors, x, of a structure are sought. The equation 

Kx = u ?Mx, (2) 

is solved, where K is the positive definite stiffness matrix and M is the semi- 
positive definite mass matrix. The mass matrix, M , can be either diagonal 
(in which case it is referred to as a diagonal mass matrix) or can have approx- 
imately the same structure as K (in which case it is referred to as a consistent 
mass matrix). Because K is positive definite and M is semi-positive definite, 
the eigenvalues of 2 are non-negative. When a dynamic load is applied to a 
structure, the structure begins to vibrate. If the vibration frequency is close 
to a natural frequency of the structure, w, then the resulting resonance can 
lead to structural failure. Engineers want to ensure that a structure has no 
natural frequencies near the range of frequencies of expected dynamic loads. 
Engineers are often interested in a few of the lowest frequencies of the struc- 
ture to ensure that these natural frequencies are well above the frequencies 
of expected dynamic loads. [Jon89]. In some situations, all the frequencies 
in a particular range may be sought. [Kni89]. 

In the buckling problem, the smallest load at which a structure will buckle 
is sought. These buckling loads of a structure, A, are the eigenvalues of the 
system, 

Kx = -XK G x, (3) 

where K is the positive definite stiffness matrix, Kg is the geometric stiffness 
matrix, and x is the buckling mode shape vector [Mah87]. The I<g matrix 
has approximately the same structure as K and can be indefinite and singu- 
lar. Because K G is indefinite, the eigenvalues of Equation 3 can be negative, 
although in most practical problems they are positive. Equation 3 can have 
up to n distinct eigenvalues; although only the smallest eigenvalue is of phys- 
ical importance since any load greater than the smallest buckling load will 
cause buckling [Jon89]. Designers may, however, be interested in the six to 
ten lowest eigenvalues in order to observe the spacing of the buckling loads. 
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If the lowest buckling loads are greatly separated from other buckling loads, 
the designer may seek to change the structure in order to cause the lowest 
buckling loads to become closer to the other buckling loads of the structure. 
[Kni89]. Because the Kq matrix can be indefinite and singular, the buckling 
problem is more difficult to solve numerically than the vibration problem. 


4 



3 The Generalized Eigenvalue Problem 
3.1 Properties 

Equation 1 yields n eigenvalues if and only if the rank of M is equal to n, 
which may not be the case in the problems under consideration [GL83]. In 
fact, if M is rank deficient, then the number of eigenvalues of Equation 1 
may be finite, empty, or infinite [GL83]. In practical problems, however, the 
set of eigenvalues is finite and non-empty. When K and M are symmetric 
and one or both of them is positive definite, the eigenvalues of Equation 1 
are real (without sacrificing generality, it will be assumed that M is positive 


definite). An orthogonal matrix, R , exists such that 

R t MR = diag(p ?) = D\ (4) 

where the ft? are the eigenvalues of M and are therefore positive real. If M 
is expanded to RD^R? , then Equation 1 becomes 

(. I< - A M)x = (I< - A RDDR t )x. (5) 

Equation 5 can be rearranged to become 

(. K - A M)x = (RD(D~ 1 R t KRD- 1 - A I)DR T )x. (6) 

Taking the determinant of each side yields 

det(I< - AM) = (det(R)) 2 (det{D)) 2 det(P - XI), (7) 


where P = D -1 R T K RD ~ l . Because R is orthogonal det(R) = 1. Det(D) is 
the product of the eigenvalues of M , which are all positive real. Therefore, 
the roots of det(K — AM) are those of det(P — XI). Because P is a symmetric 
matrix, its eigenvalues are all real and therefore those of Equation 1 are all 
real [Wil65]. 

When K is positive definite and M is positive semi-definite, as they are in 
the vibration problem, the eigenvalues of Equation 1 are all non-negative. To 
show that the eigenvalues are non-negative, multiply each side of Equation 1 
by x T to get 

x t Kx — X x t Mx. (8) 
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Because K is positive definite, x T Kx is positive, and because M is posi- 
tive semi-definite, x T Mx is non-negative; therefore, A must be non-negative 
[Wil65]. 

The eigenvectors of Equation 1 satisfy M -orthogonality ( x T My = 0, if x 
and y are ^/-orthogonal). The matrix, P, has a complete set of eigenvectors, 


Z{, and the same eigenvalues as Equation 1. Thus, 

Pz{ — A (9) 

and, 

D~ l R T I< RD~ l Zi = A ,z,. (10) 

Then, the sequence of transformations in Equation 11 show that x = RD~*z, 

KRD-'zi = XiRDzi = \ i RD(DR T RD~ 1 )z i = A,M(f?D _1 zj) (11) 

Because the z ,• are known to be orthogonal, 

0 = zfzj = (DR t Xi) T DR T Xj = XiMxj. (12) 


The methods and algorithms discussed in this paper rely on the eigenvalues 
being real and the eigenvectors being Af -orthogonal. In addition, LANZ 
makes use of the property that the eigenvalues in the vibration problem are 
non-negative. 

3.2 Solution Methods 

Several methods for solving the large symmetric generalized eigenvalue prob- 
lem exist. A popular method, especially in structural engineering, is sub- 
space iteration [Bat82]. Nour-Omid, Parlett and Taylor provide convincing 
theoretical and experimental evidence that Lanczos’s method is superior to 
subspace iteration on a sequential machine [NOPT83]. The parallelization of 
subspace iteration was investigated by Mahajan and it remains an open ques- 
tion as to whether Lanczos’s method will be superior to subspace iteration 
on parallel machines [Mah87]. Another solution method for the generalized 
eigenvalue problem is a two-level iterative method proposed by Szyld [Szy83]. 
He uses a combination of the inverse iteration and Rayleigh quotient meth- 
ods at the outer level, and an iterative solver for indefinite systems at the 
inner level. Szyld assumes, however, that M is non-singular, which is not 
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always the case for the applications under examination. A promising method 
for parallel machines based on the Sturm sequence property is proposed by 
Ma [Ma88]. He uses multi-sectioning or bi-sectioning to obtain the eigen- 
values, and then uses inverse iteration to recover the eigenvectors. Again, 
the assumption is made that M is non-singular. Schwarz proposes a method 
which minimizes the Rayleigh quotient by means of the conjugate gradient 
method. The method uses a partial Choleski decomposition as a precon- 
ditioner to speed convergence [Sch89]. SOR-based methods have also been 
proposed for the generalized eigenvalue problem, but these suffer from two 
flaws: 1) the methods have difficulty when the eigenvalues are clustered, and 
2) the methods require that M be positive definite [Ruh74]. Other meth- 
ods have also been proposed [SW82] [Sch74]. Block Lanczos methods have 
been developed but are more complicated than the simple Lanczos process. 
Block methods are limited in that the user of the method must choose the 
size of the blocks [NOC85]. One significant advantage of the block methods 
is that they can easily reveal the presence of multiple eigenvalues [GU77]. 
To the best of the authors’ knowledge, no satisfactory method of choosing 
this block size to best determine the multiplicity of eigenvalues has been 
proposed. The block size is usually chosen to take advantage of a particular 
computing architecture [Ruh89]. 


7 



4 Lanczos’s Method 

4.1 Execution in Exact Arithmetic 

In order to understand Lanczos’s method when applied to the generalized 
eigenvalue problem, it is first necessary to examine the method when applied 
to 

Ax = Ax, (13) 

where A is an n x n real symmetric matrix. In Lanczos’s method, the matrix 
A is transformed into a tridiagonal matrix, T, in n steps in exact arithmetic. 
However, roundoff errors make the simple use of Lanczos’s method as a di- 
rect method for tridiagonalizing A impractical [Sim84]. Paige suggests using 
Lanczos’s method as an iterative method for finding the extremal eigenvalues 
of a matrix [Pai7l]. At each step, j, of the Lanczos algorithm, a tridiagonal 
matrix, Tj, is generated. The extremal eigenvalues of Tj approximate those 
of A, and as j grows, the approximations become increasingly good [GL83]. 
Both Kaniel and Saad have examined the convergence properties of Lanczos 
in exact arithmetic [Kan66] [Saa80]. The convergence results that they de- 
rive imply that the speed of convergence of the eigenvalues of the T matrices 
to eigenvalues of A depends on the distribution of the eigenvalues of A; if an 
extreme eigenvalue of A is well separated from its neighbors, convergence to 
this eigenvalue will be fast. However, clustered eigenvalues, or eigenvalues 
that are in the middle of the spectrum of A, will not be converged to as 
quickly as the extremal eigenvalues. 

In Lanczos’s method, a series of orthonormal vectors, qi — <J n , is gener- 
ated which satisfy: 

T = Q t AQ, (14) 

where the vectors qi are the columns of Q. If each side of Equation 14 is 
multiplied by Q to yield, 

QT = AQ , (15) 

and the columns on each side are set equal, then 

#j+i9j+i + qjOtj + q 2 -\P] = Aqj. (16) 

where the a’s and /?’s are the diagonal and subdiagonal, respectively, of T, 
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the tridiagonal matrix in Equation 17 [GL83]. 


f P 2 \ 

Pi Q-i pz 

Pz <*3 P 4 

i Pj - 1 Q'j-i Pj 

Pi a i / 


( 17 ) 


Equation 16 can be rearranged to become the three- term recurrence relation 


Pi+iQj+i — Aqj — qjOtj — q : -iP } . 


(18) 


From this recurrence relation at step j: 

AQ j = Q j T j + r j e J, (19) 

where rj = qj+i/Pj+i and ej is the jth column of the j x j identity ma- 
trix [GL83]. These q vectors, also called Lanczos vectors, are the key to the 
algorithm. They are generated as shown in Equation 18. That this recur- 
rence relation generates a set of Lanczos vectors, q\ . . . qj, which belong to the 
Krylov subspace, k(A, qi,j), can be shown by construction. That these Lanc- 
zos vectors are also orthonormal, and therefore span the Krylov subspace, 
fc(A,qi,j), is shown by induction in [GL83]. If the Lanczos vectors are not 
orthonormal, then this three-term recurrence relation does not generate the 
correct T matrix. 

Two of the major benefits of the simple Lanczos algorithm are: 1) the 
structure of the matrix A is not important for the Lanczos algorithm; the only 
access to A that the algorithm requires is a routine that returns the product 
of A times a vector, and 2) at step j , only two of the Lanczos vectors, qj~\ 
and qj, are required to be stored in core memory, the rest can be stored in 
secondary memory until needed for the computation of the eigenvectors. The 
simple Lanczos algorithm is shown in Figure 1 [GL83]. 

The tridiagonal eigenvalue problem generated by Lanczos’s method can 
be written in the form 

TjS = 6s, (20) 

where the 0’s are sometimes called Ritz values. The eigenvalues of Tj will 
approximate those of A, especially the extremal ones. Approximations to the 


9 



ro = starting vector 

Pi = II r o II 
qo = o 
j = l 

while (0j ^ 0) 

= r j-i/Pi 
<*> = qj Mi 

r j = Mi - <*jqj - PjQj-i 
Pi + 1 = II r i II 
j = j + 1 


Figure 1 : The simple Lanczos algorithm 

eigenvectors of A, called Ritz vectors, can be calculated using the equation 

y. = QjSi, (21) 

where y, is the Ritz vector corresponding to s, [PN085]. These Ritz vectors 
satisfy: [NOPEJ87] 

Axu = y,0i + rjSi(j). (22) 


4.2 Effect of Roundoff Errors 

The algorithm and equations in Subsection 4.1 hold only in exact arithmetic; 
they degenerate in the presence of roundoff errors, and, therefore, the Lanczos 
vectors can no longer be assumed to be orthonormal. When roundoff errors 
are taken into account, Equation 19 becomes 

AQj = QjTj + r j e- -f- Fj , (23) 

where Fj is used to represent roundoff error. Then, 

II Mi ~ ytfi ||< Pji+ II Fj ||, (24) 

can be used to bound the error in the Ritz pair (0,-,y,). The norm of F can 
be bounded by n^ 2 c || A ||, where e is a constant based on the floating point 
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precision of the computer, and ftji = /3j + 1 | | [PN085]. The bound on 

|| F(j) || is small and can be disregarded. The most important factor then 
becomes . From [Pai7l] 

I yfft+i 1= Ti/ftn ( 25 ) 

where 7 is a roundoff term approximately equal to e || A ||. From Equation 25, 
the conclusion can be drawn that as /?j t - becomes small (and hence the error 
in the Ritz pair, (0„y<), also becomes small), the q vectors lose orthogonality 
[Par80]. Equation 25 implies that convergence of a Ritz pair to an eigenpair 
of A results in a lack of orthogonality among the Lanczos vectors. More 
specifically, significant components of y,-, the converged Ritz vector, creep 
into subsequent q/s, causing spurious copies of Ritz pairs to be generated by 
the Lanczos process. 

Several remedies for the loss of orthogonality have been proposed. Paige 
suggests full reorthogonali zation , in which the current Lanczos vector, qj, is 
orthogonalized against all previous Lanczos vectors [Pai71]. Full reorthogo- 
nalization becomes increasingly expensive as j grows. Cullum and Willoughby 
advocate a method in which the lack of orthogonality is ignored and sophis- 
ticated heuristics are used to detect the eigenvalues that are being sought 
among the many spurious eigenvalues that are generated [CW85]. Simon 
proposes a scheme called partial reorthogonalization in which estimates for 
the orthogonality of the current Lanczos vector, qj, against all previous Lanc- 
zos vectors are inexpensively computed. Based on the estimates, a small 
set of the previous Lanczos vectors are orthogonalized against qj [Sim84]. 
Partial reorthogonalization maintains semi-orthogonality among the Lanc- 
zos vectors. For all q vectors, semi-orthogonality is defined by: 

qjq, < e 1/2 * ^ j. (26) 

“Semi-orthogonality” among the q vectors guarantees that the T matrix gen- 
erated by the Lanczos algorithm executed with roundoff errors will be the 
same up to working precision as the T matrix generated using exact arith- 
metic [PN085]. Selective orthogonalization is used by LANZ in an adapted 
form to maintain semi-orthogonality among the Lanczos vectors [PS79]. The 
strategy of selective reorthogonalization, as proposed by Parlett and Scott, 
orthogonalizes the current residual vector, r^, and the last Lanczos vector, 
qj _ 1, at the beginning of each step in the Lanczos algorithm against “good” 
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Ritz vectors (more details of how this occurs will be given in Section 5). 

Good” Ritz vectors are those which correspond to Ritz values for which 
the value of is below e 1/2 || A ||. A low /? Jt value suggests that the Ritz 
value is converging and, therefore, from Equation 25, | yTq J+1 | is increasing. 
The value of e 1 ' 2 || A || is used to ensure that the quantity | yj q j+l | never 
rises above c 1 / 2 . As a result, semi-orthogonality, as defined in Equation 26, 
is maintained. 


12 



5 The LANZ Algorithm 

5.1 The Spectral Transformation 

In order to use Lanczos’s method to find the lowest eigenvalues or the eigen- 
values closest to some value, a, of Kx = AMx, a transformation of the 
problem must be made. The Lanczos algorithm described in Section 4 is 
applicable only to Ax = Ax. Two transformations are available when K and 
M axe symmetric and M is positive semi-definite. Each transformation in 
this section will be represented by a capital letter that has no other mean- 
ing. Transformation A, proposed by Ericsson and Ruhe, replaces A with 
W T (K - oMY'W to yield, 

W T (I< - oM)- 1 Wy = vy, (27) 

where M = WW T , y = W T x , and A = a + l/i/ [ER80]. Transformation B, 
suggested by Nour-Omid, Parlett, Ericsson and Jensen, replaces A with (K — 
<tM)~*M and uses the M-inner product, because the operator is no longer 
symmetric [NOPEJ87]. Note that M does not form a true inner product 
because M is not positive definite. This semi-inner product is acceptable, 
however, because the only situation in this algorithm in which x T Mx = 0, 
for a non-trivial x, is when /3 J+ i = 0, which indicates exact convergence in 
the Lanczos algorithm. (Hereafter, the semi-inner product will be referred 
to as just an inner product). Equation 1 becomes 

(I< - oM)-'Mx = vx, (28) 

where the eigenvalues of the original system can be recovered via 

A = cr + 1/v. (29) 

Transformation B is a shifted inverted version of Equation 1 by virtue of the 
following steps. Substituting Equation 29 in Equation 1 yields 

Kx — crMx = \jvMx. (30) 

Then, solving for x and multiplying by v gives 

vx = (I< - aM)~ l Mx. (31) 
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1) Choose an initial vector, guess 

2) ro = (K — aM)~ l Mguess (Purge n(B) from guess ) 

3) pi = Mr 0 

4 ) A = ( r oPi) 1/2 

5) For j = 1, maximum number of iterations 

6) Reorthogonalization phase 

7 ) = r#-i/A 

8 ) pj = Pj/Pj 

9) {K — crM)rj = pj (symmetric indefinite solve) 

10) r, = rj - qj-iPj 

11 ) Q j = r jPi 

12 ) n = rj - q jQj 

13 ) Pi + 1 = Mr, 

14) P i+ , = (rJp i+1 )V2 

15) Compute the eigenvalues of Tj and the corresponding error bounds 

16) Compute any converged Ritz vector 

17) Halt if enough eigenvalues have been found 

18) End of Loop 


Figure 2: The Lanczos Algorithm Using Transformation B 

Transformation B is superior to transformation A in both storage and 
operation count requirements [NOPEJ87]. Transformation B requires some 
modifications to the original algorithm, including the solution of the system 
(K - aM)x = y for x , and the use of the M-inner product. The vector p has 
also been added to the original Lanczos algorithm to hold the intermediate 
quantity Mrj. In the initialization step, the initial guess vector is multiplied 
by B to ensure that r 0 is contained in r(B), where B = (K - The 

efficient implementation of these operations is described in later sections. The 
modified algorithm is shown in Figure 2 [NOPEJ87]. Reorthogonalization in 
step 6 is much the same as that described in Section 4, with the exception 
that the reorthogonalization is done in the M-inner product [NOPEJ871. 

If the matrix M is singular, then n(S) might not be null. The eigen- 
vectors of Equation 1 have no components in n(Af) (also, n(M) = n(B)) 
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[NOPEJ87]. In exact arithmetic, if the starting vector, q 0 , of the Lanczos 
process is restricted to r(B), then all subsequent Lanczos vectors will be re- 
stricted to r(B), because they are computed by multiplying by B. However, 
in finite precision arithmetic, roundoff error allows components of subsequent 
Lanczos vectors to be in n(H); therefore, the Ritz vectors calculated from 
them will have components in n(B) [NOPEJ87], Purifying these Lanczos 
vectors is an expensive process, but a method exists that instead will in- 
expensively purify the calculated Ritz vectors. The vector W{ is computed, 
where Wi is a j -f - 1-length vector whose first j components are calculated as, 


v>i = (1 /OiXTjSi), (32) 

and whose last component is 

(*U)ft+i)/»i- ( 33 > 

Equation 21 then becomes: 

yi = Qj+\Wi, ( 34 ) 


where to, has replaced s*. 

5.2 Transformations for Buckling 

Transformations A and B are not applicable to the buckling problem because 
K g can be indefinite. Transformation A fails because it requires the Choleski 
factorization of Kq • Transformation B fails because it requires the use of a 
Kc-inner product which would be an indefinite inner product and introduce 
complex values into the calculations. Transformation C, suggested for the 
buckling problem in [Ste89b], uses the operator (I< + aK G )~'I<. The trans- 
formation ( K — o Kg) is actually suggested in [Ste89b], but (/f 4* &Kg) I s 
preferred because it yields the correct inertia for the buckling problem (note 
that the inertia referred to here is not the inertia of a physical body, it is 
the definition of inertia used in Sylvester’s inertia theorem relating to the 
number of positive, negative, and zero eigenvalues of a matrix). The inertia 
of ( K + vKg ) reveals how many eigenvalues of Equation 3 are less than cr. 
Transformation C can be derived from Equation 3 by substituting <Tv/{y— 1) 
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for A in Equation 3, multiplying each side by ( v — 1), rearranging terms, and 
finally, multiplying each side by (I< + <tKg)~ x to yield 

vx = ( I< + oKgY'Kx. (35) 

To recover A, use Equation 36. 

A = avf{v — 1). (36) 

Transformation C requires that a be non- zero. The factorization of (K + 
oKq) and the use of the K-'umei product are necessary for transformation 
C. 

In exact arithmetic, the convergence rate of the Lanczos algorithm is the 
same for transformations B and C when a is fixed (B performs the same 
transformation on the spectrum as A). The Kaniel-Page-Saad theory is used 
to explain the convergence of eigenvalues when using the Lanczos algorithm 
and is now introduced to allow a comparison of the transformations and 
to show the effects of moving a on the convergence rate [Saa80]. Three 
definitions that will be useful in this explanation are: 

K ’i = ff WL - >w)/(*i - (37) 

m=l 

7, = 1 + 2 (Ui - */,+! )/(*/,+! - !/,„/), (38) 

and the Chebyshev polynomial, 

C n (x) = l/2((* + (* 2 - l) 1 / 2 )" + (x - (x 2 - l) 1 / 2 )”). (39) 

The bound on the difference between an eigenvalue of the Tj matrix, 0,, and 
an eigenvalue of the transformed system, i/,, at the jth step of the Lanczos 
algorithm is 

0 < Vi - Oj < - Vi„f)(K{tan L>(xi,r o)/^.,^,)) 2 , i/ t - > (40) 

The tan u;(x,,r 0 ) is determined by the angle between the eigenvector, x,-, 
associated with V{ and the starting Lanczos vector, i*o. Because the angle be- 
tween x, and r 0 does not change during the Lanczos algorithm and because 
K* does not vary greatly, the term that governs the rate of the convergence 
is Cj-i(~a). As j increases, C,_,( 7 ,) grows more quickly for large fa than for 
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Figure 3: Effects of transformation on eigenvalue separation 

small * (* is a term, | (* - « + i)/(* + i ~ *w) I, obtained from the definition 
of v) The fa reflect the separation of individual eigenvalues from their neigh 
bors relative to the remaining width of the spectrum. Transformations are 
used to increase fa for the desired eigenvalues by transforming the spectrum 
such that the desired eigenvalues axe well- separated from other eigenvalues. 
It can be shown that, if used with the same a, transformations B and C have 
the same effect on the fa. However, moving a closer to a desired eigenv ue, 
A„ increases the corresponding fa (and therefore speeds convergence of ^ to 
Ai). The increase in fa as a is moved closer to Ai is shown in Figure 3. us, 
as a is moved closer to Ai the convergence of 0i to Ai is speeded up. 

Transformations B and C have the same effect on the convergence rates 
of the Lanczos process and C can be used in both the buckling and vibration 
problems, so the question arises “Why not use transformation C for both the 
buckling and vibration problems?” Although B and C have the same effect 
in exact arithmetic, they each yield different *’s for the same a. In finite 
precision arithmetic, transformation C is inferior to transformation B when 
a is small relative to the desired A’s. Although each transformation requires 
the solution of a linear system and the multiplication of a matrix by a vector, 
the distribution of the i/’s for small a in transformation C leads to large 
errors in the computation of the A’s. For small a, the «/’s of transformation 
C become close to 1 while the same effect is not seen when transformation 
B is used (note that the fa's in each case are identical). The v s that result 
when using transformation C become increasingly close to 1 as a is moved 
from 1.0 to 0.01, whereas the i/’s that result when using transformation B 
show little change (this trend is shown in Figure 4). Because the Lanczos 
algorithm consists of the same calculations for each transformation, in finite 
precision arithmetic the algorithm computes perturbed values assumed to be 
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Figure 4: Effects of transformation on eigenvalues 

of the form, (1 + e)i/, instead of an exact v. The effect of this perturbation on 
the computed A’s is the difference between the two transformations. Recall 
that in transformation B, A = a + 1 fv B , and that in transformation C, 
A = <r + <r/(u c - 1) (the subscript on v is introduced because the i/’s are 
different in each transformation and these values are being compared). If u c 
is solved for in terms of v B , then 

vc = ov B + 1. (41) 

Let 6 X B and 8X C denote the difference between the true A and the A computed 
using transformations B and C, respectively. These <5A’s are expressed in 
terms of perturbed i/’s in the following equations: 

A + SX B = o + 1/(1 + e)i/ B , 

X + SXc = a + <r/((l + t)u c — 1). 

If Equation 41 is substituted into Equation 43 , then 

A + SXc = <7 + oj {crv B + tav B + e) 

If the true A is subtracted from each side of Equations 42 and 44 , then 

SX B = \f{v B + tv B ) - \fv B . ( 45 ) 

= l/(i/ B + tv B + tja) - \fv B . (46) 

Thus, the error in the computed A for transformation C increases sharply 
as <j decreases. To show the increase in error for transformation C, the 


(42) 

(43) 

(44) 
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errors in the two transformations (from Equations 45 and 46) are plotted in 
Figure 5 for A = 10 and t = 0.0001. From this derivation and the graph of 
the functions, it is clear that transformation C should be avoided when a is 
small compared to the desired A. 

From the previous discussion the conclusion can be drawn that trans- 
formation B is preferred to C whenever possible. However, as was pointed 
out previously, transformation B is not applicable to the buckling problem. 
Therefore, a new transformation, D, which transforms the eigenvalues in the 
same fashion as transformation B (when & = 0) is introduced. Transforma- 
tion D can be used with an indefinite K G matrix but can only be used when 
cr is 0. Transformation D is derived from Equation 3 in the following steps 
[CW85]: first, substitute l/j/ for A and then multiply each side by K~ y v to 
yield 

vx = K~ x K g x ; (47) 

next, expand the implicit identity matrix in each side as / = C~ T C T , where 
K = CC T , let y = C T x, and, finally, multiply each side by C T to yield 

vy = C- l K G C- T y. (48) 

The operator for transformation D is C~ 1 K G C~ T . This transformation re- 
quires the Choleski factorization of K and uses the standard inner product. 
The eigenvectors, x , must be recovered via the solution of a triangular linear 
system, using the foregoing equation for computing y. When an initial non- 
zero guess for o exists, the method used in LANZ for solving the buckling 
problem uses transformation C exclusively; when an initial guess for a isn’t 
available, the method used begins by using transformation D with a at 0, 
and then switches to transformation C when a shift of a is needed (the use 
of shifting will be described in the next subsection). Thus, the use of trans- 
formation C with small a is avoided, and, yet, the advantage resulting from 
shifting is maintained. 

5.3 The Use of Shifts 

An efficient algorithm for computing several eigenvalues requires that the 
shift, be moved as close as possible to the eigenvalues that are being com- 
puted. The closer that a is to an eigenvalue being computed, the faster the 
convergence to that eigenvalue. Ericcson and Ruhe describe a method for 
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Figure 5: Errors in Transformations B and C 
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selecting shifts and deciding how many Lanczos steps to take at each shift 
[ER80]. The efficiency of the algorithm depends on how well these shifts are 
chosen, and how many Lanczos steps are taken at each shift. Normally, the 
most expensive step in the Lanczos process is the factorization of (K - aM), 
which must be done once for each shift. But if j becomes large, the calcu- 
lation of an eigenvector, Equation 21, can be very expensive. In addition, 
many steps could be required to converge to an eigenvalue if the shift is far 
away from this eigenvalue, or if the eigenvalue is poorly separated from its 
neighbors. The method used by Ericcson and Ruhe first estimates that r 
eigenvalues will converge in j steps, where r is calculated based on the fact 
that the eigenvalues of Equation 1 are linearly distributed. A cost analysis of 
the algorithm is performed, and from this analysis, a determination of how 
many steps to take at a shift is made. Their choice of shift depends on the 
inertia calculation in the factorization step [ER80]. 

In the problems from the NASA structures testbed [Ste89a], the distribu- 
tion of the eigenvalues is often anything but linear. This distribution makes 
the above estimates invalid and requires a different method for deciding how 
many steps to take at each shift. Instead of calculating the number of steps 
to take for a shift prior to execution, LANZ uses a dynamic criterion to 
decide when to stop working on a shift. Later, in Section 6, a cost analy- 
sis of the Lanczos algorithm shown in Figure 2 is given. This cost analysis 
is part of the basis for the dynamic shifting algorithm. The implementa- 
tion of LANZ, however, uses execution timings, rather than a precalculated 
cost analysis, because the cost analysis is different for each architecture on 
which the implementation is run. These timings let LANZ know how long 
each Lanczos step takes and the cost of factorization at a new shift. In 
addition to the timing information, the estimated number of steps required 
for unconverged eigenvalues to converge is calculated. The step estimate 
is computed by tracking the eigenvalues of Tj (and the corresponding error 
bounds) throughout the execution of the Lanczos algorithm. The method for 
this tracking and computation of eigenvalues is described in Section 7. With 
estimates for the number of steps required for eigenvalues to converge and 
the time needed for a step (or new factorization) to execute, the decision to 
keep working on a shift or choose a new shift can be made efficiently. 

The selection of a new shift depends on the information generated dur- 
ing the execution of LANZ on previous shifts and on inertia calculations at 
previous shifts. The inertia calculations are used to identify any eigenvalues 
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that were skipped over in previous steps, including eigenvalues of multiplicity 
greater than one. The estimated eigenvalues and their error bounds gener- 
ated during the execution of Lanczos enable the selection of a new shift based 
on these estimates (if no eigenvalues were skipped in the previous run). Be- 
cause convergence to an eigenvalue is faster if the shift, cr, is chosen to be 
close to that eigenvalue, LANZ seeks to choose a shift that is near to the 
desired unconverged eigenvalue. However, cr must not be chosen so close to 
an eigenvalue that the system ( I< - <jM)x = y becomes very ill-conditioned. 
In the authors’ experience, Lanczos’s method generates approximations to 
all nearby eigenvalues, so that even if an eigenvalue is not converged to, an 
estimate along with an error bound for a nearby eigenvalue is generated. If 
the initial Lanczos vector is not deficient in the eigenvector corresponding 
to an eigenvalue, and if that eigenvalue is close to a, then the Kaniel-Page- 
Saad theory shows that Lanczos’s method will generate an estimate to that 
eigenvalue in a few steps [Kan66]. In practice, even if the initial Lanczos 
vector is deficient in the eigenvector, round-off error will quickly make that 
eigenvector a component of the Lanczos vectors [ER80]. 

When a new shift is chosen, the initial Lanczos vector is chosen to be 
a weighted linear combination of the Ritz vectors corresponding to the un- 
converged Ritz values of the previous shift, where the weights are chosen as 
the inverse of the error bounds of those Ritz values [PS79]. The number of 
Ritz vectors chosen is based on the number of eigenvectors still to be found 
and the number of Ritz vectors with “reasonable” error bounds. To exam- 
ine the effect of using a linear combination of Ritz vectors rather than a 
random vector as the initial vector, several structural engineering problems 
(both buckling and vibration) were solved using both methods for selecting 
an initial vector (for an explanation of the problems used, see Section 9). The 
number of steps taken by each method to get the same number of eigenvalues 
is given in Figure 6 and from this it appears that using the linear combination 
of Ritz vectors is always as good or better than choosing a random vector. 

To give the reader a clearer picture of the overall execution flow of LANZ, 
a flow chart is shown in Figure 7. 

5.4 Selective Orthogonalization 

The method used to maintain “semi-orthogonality” among the Lanczos vec- 
tors is a modification of selective orthogonalization as proposed by Parlett 


22 



Problem 

Size 

Random 

Linear Combination 

Mast 

(buckling) 

3 

^ II 
II 

11 t— * 

Ox CO 
00 00 
o 

19 steps 

19 steps 

Mast 

(vibration) 
(diagonal M) 

n = 1980 
13 = 58 

10 steps 

10 steps 

Mast 

(vibration) 
(non-diagonal M) 

n = 1980 
(3 = 58 

! 

11 steps 

8 steps 

Cylinder 

(buckling) 

n = 7644 
(3 = 385 

3 steps 

2 steps 


Figure 6: Methods for choosing initial vector 

and Scott [PS79]. As described in Section 4, the Lanczos vectors do not 
lose orthogonality until a Ritz pair converges to an eigenpair of the system. 
At this point, significant components of the Ritz vector that have converged 
begin to creep into subsequent Lanczos vectors. Selective orthogonalization 
monitors the level of orthogonality between converged Ritz vectors and Lanc- 
zos vectors. If at step j, the orthogonality level between a converged Ritz 
vector and the Lanczos vector q 3 is above a given threshold (Parlett and Scott 
suggest that e 1 / 2 be used), the Ritz vector is purged from both q j and 

Parlett and Scott use the following derivation to monitor the level of 
orthogonality [PS79]. In finite precision arithmetic the Lanczos recurrence 

relation is 

&+i9j+i = B( h ~ Q j<U ~ PjVS-i + fi ( 49 ) 

where B is one of the operators described in Subsection 5.1 and /,• represents 
the roundoff error. The bound on || /, || is cc || B || where c is a constant 
independent of j derived from B . If each side of Equation 49 is multiplied 
by where y is a Ritz vector computed from Equation 21, then 

i/ r &+i9i+i = y TB< a - y T a i<n - y T P}<B-i + y T /*- ( 50 ) 

Multiply each side of Equation 23 by s to yield 

By = 0y + r ( 51 ) 
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Figure 7: Execution flow of LANZ 
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where r is not rj and will be discussed below. If the bound for || /,- || and 
Equation 51 are substituted into Equation 50, Equation 52 results. 

&+iy T ?j+i = {0y T + r T - ocjy T )qj - Pjy T qj _i + cc || B || (52) 

Parlett and Scott assume that r = qk+iPki for some k < j, and therefore, 

that 

I r Q.j Pki | Qk+lQ} I • (53) 

They then state: 1) | qj +1 qj |< e 1 ^ 2 , because semi-orthogonality among the 
Lanczos vectors is maintained, and 2) if y is a converged Ritz vector, then 
/ Qji is less than or equal to e 1 / 2 || B ||. Fact 2 is caused by the definition 
of a “good” Lanczos vector given in Section 4. If facts 1 and 2, along with 
Tj =| y T qj |, are substituted into Equation 52, then Equation 54 is derived. 

r i+i < (I 9 - a, | tj + + e\\B\\ -fee || B || )//?;+, (54) 

Because c and || B || are small and not readily available, Parlett and Scott 
ignore these terms and derive the following recurrence relation for r J+1 : 

Tj+l < (| 0 - | Tj -f PjTj-J/fij+L (55) 

The values To and tj are initialized to e and whenever y is purged from qj, 
Tj is reset to t. From this recurrence relation, the conclusion can be drawn 
that the Lanczos vectors should be orthogonalized in pairs. 

This recurrence relation predicts the actual level of orthogonality very well 
in practice with two exceptions. The first problem occurs when calculating 
Tj + 1 after y has been computed at step j — 1 and purged from qj-\ and q 3 . In 
this situation, a small increase in t j+1 over r, and Tj_ x is expected. However, 
a large increase occurs. This increase is caused by the assumption on Parlett 
and Scott’s part that | 1^ fl ^ 2 > when in fact that equation only holds 

when k < j — 1. The quantity, | qj +1 qj |, is 1 when k = j — 1. Thus, 
Equation 55 holds when k < j — 1 , but when k = j — 1 Equation 55 becomes: 

T J+ 1 ^ (I 9 - Oj I Tj + PjTj-i + pj- \ t i)!Pj+x. (56) 

The second problem arises when using Equation 34 to compute y. In this 
case r = 6y -f Pj_ijqj -f Pj^jBqjjQ assuming y is computed at step j — 1; 
therefore, 

I r T qj \< OTj -f pj.i'i + pj_ u aj/6. (57) 
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The recurrence relation used for t j+ i then becomes: 

Tj+ X < (I 0 — Ctj | Tj + PjTj- 1 + Pj-l ,« + Pj+ 1* (®®) 

If y has been computed at step j — 2, then 

| r T qj |< 0Tj + + f3j-2,iPj/0, (59) 

and, because the second term is very small, the recurrence relation for r J+1 
becomes 

Tj+i < (| 6 — oy | Tj + PjTj-i + Pj-2,iPj/Q)/ Pj+i • (60) 

5.5 Orthogonalization Methods 

Selective orthogonalization and partial reorthogonalization are the two best 
known orthogonalization methods other than full reorthogonalization. Par- 
tial reorthogonalization monitors the orthogonality of q, and q j+ 1 versus the 
other Lanczos vectors. Partial orthogonalization measures the orthogonality 
between qj and qk , via the recurrence relation defined in the following 
set of equations [Sim84]: 

u k k = 1 fork = l,...,j, ( 61 ) 

W k k-1 — Qk9k-1 / or ^ == ( 62 ) 

p j+r u} j+lk = pj+ iWj'fc +1 + (or* - otj)ujj k + pkUjk-i - PjWjk -1 + qjfk - 9k fi (63) 
and u>jfc + i = a>fc+i j for 1 < k < j. (64) 

Simon has stated that the theoretical relationship between partial reorthogo- 
nalization and selective orthogonalization is not known [Sim84]. The follow- 
ing discussion explains the relationship between the two methods. If j/; on 
the left side of Equation 50 is expanded to sjQj, and the resulting equation 
is divided by /3 J+ i , the right side becomes T j+hiy yielding 

sjQjqj+i = Tj+i'i (65) 

From the recurrence relation for partial reorthogonalization, the product 
Qjqj+ 1 is the vector Uj+i,k, where k runs from 1 to j. Thus the relation- 
ship between the o>’s and the t's is governed by 

s T uj j+ i,k = Tj+i.fc, where k = 1, ...» j. ( 66 ) 
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This relationship in Equation 66 has also been observed in numerical exper- 
iments run by the authors. When a Ritz vector, y,, is purged from qj+i and 
therefore Tj +li < becomes small, the Wj+i.k’s for which the values of s,-,* are the 
largest decrease significantly. 
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6 Execution-Time Cost Analysis 

An analysis of the execution-time cost of the Lanczos algorithm when using 
transformation B is given in Appendix A. Because the costs for the other 
transformations are almost identical their cost will not be analyzed in this 
section. The analysis has two purposes: 1) to allow the computation of the 
tradeoff point between re-starting the Lanczos algorithm at a different shift 
and continuing with the current shift, and, 2) to allow analysis of performance 
on parallel and vector /parallel machines. Throughout the analysis, only the 
cost of floating point operations is included. The assumption is made for 
this analysis that the matrices have a band structure and that the Bunch- 
Kaufman method (or an LDL T decomposition) is used for the solution of 
the linear systems. In order to simplify the analysis, the assumption is made 
that the bandwidth of K is greater than or equal to the bandwidth of M. This 
assumption has no effect on the analysis other than to avoid making some of 
the operation costs conditional on which matrix has the larger bandwidth. 
Much of this analysis does not take into account “end conditions,” such as 
those that arise near the end of a factorization when less work needs to be 
done than in the middle of a factorization. Thus, some of the expressions are 
necessarily approximations. 

Several observations regarding the shift tradeoff can be made from the 
cost analysis: 1) the single most expensive step in the algorithm is the fac- 
torization phase (2B) which is 0(np \ ), 2) the cost of the reorthogonalization 
phase increases as j increases because of the increasing number of “good” 
Ritz vectors to orthogonalize against, 3) the cost of computing a converged 
Ritz vector is based on j 2 and therefore increases rapidly as j increases, and 
4) the cost of the rest of the operations in the program loop is not affected by 
growth in j (with exception of step 15 but this step is not costly enough to 
consider). To illustrate how the costs of the four operation groups per Lanc- 
zos step change, the number of floating point operations per step is plotted 
against j , the number of Lanczos steps, in Figure 8. The costs in the Fig- 
ure 8 are from an actual LANZ run during which a new shift was selected 
beginning at step 22. These costs, of course, will differ for each problem. 
From the cost analysis and this graph it can be seen that a tradeoff exists 
between the benefits of a taking a new shift (smaller reorthogonalization and 
eigenvector computation cost as well as accelerated convergence to desired 
eigenvalues) and the benefit of continuing work on the current shift (avoiding 
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Figure 8: Operation costs plotted against Lanczos step number 
the cost of refactorization). 

The LANZ implementation uses actual timings of the various steps dur- 
ing the current run to analyze the tradeoff, rather than substituting values 
for the cost of various operations for the machine being used. The use of 
timings is simpler to implement and makes the code more portable. 
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T Solution of the system Ts = Os 

The size, j, of the tridiagonal system, Ts = Os, generated by the Lanczos 
algorithm is 1 at step 1 and increases by 1 at each Lanczos step. The size 
of T is usually very small compared to the size of the original problem, n. 
Therefore, the time used to solve the tridiagonal system does not greatly 
affect the sequential execution time of the LANZ algorithm. However, if the 
rest of the algorithm is parallelized, the solution of the tridiagonal system 
could well become a large factor in the parallel execution time. Parlett 
and Nour-Omid have proposed a method of tracking a small group of the 
eigenvalues of the T matrices as they are produced by the Lanczos algorithm. 
An inexpensive by-product of their method is the error bounds for the 6 ,• s 
[PN085]. Their algorithm monitors the outermost O' s whose error bounds, 
jSji, indicate that they will converge in the next 2 or 3 steps; it actually 
monitors 8 eigenvalues at a time. There are two phases: 1) the previous 
0’s and their error bounds are updated and any new 0’s are detected, and 
2) converged 0’s are detected and removed from the data structure. This 
algorithm is not suitable for use by LANZ for 3 reasons: 1) it is not easily 
parallelizable, 2) it does not track an eigenvalue for many steps to get a 
convergence rate estimate, and 3) in tests run by the authors, it often failed. 

The authors have developed a new solution method that is: 1) inherently 
parallel, 2) tracks all the eigenvalues of T from step to step, and 3) has been 
used successfully with LANZ to solve real structures problems. The method 
uses information from step j — 1 to solve for all the eigenvalues and their error 
bounds at step j. It uses Cauchy’s interlace theorem, shown in Equation 67, 
to find ranges for the all the eigenvalues (except the outermost eigenvalues) 
of Tj from the eigenvalues of Tj-\. 

0\ +l < 0{ < 0{ +1 <0{... 0j +1 <0j < 0# J (67) 

Cauchy’s interlace theorem states that the eigenvalues of Tj interlace those 
of Tj+\ [Par 80], In addition to the interlace property, the error bounds, 
from the previous step can be used to provide even smaller ranges for some 
eigenvalues. If good error bounds, are not available for the outer eigenval- 

ues (the interlace property only gives a starting point for these eigenvalues), 
they can be found by extending an interval from the previous extreme eigen- 
value. However, a property of the Lanczos algorithm is that the extreme 
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eigenvalues are usually the first to stabilize. The algorithm for the method 
just described is given in Figure 9. For simplicity, the algorithm does not 
show the code for handling extreme eigenvalues. The algorithm requires two 
subroutines, the root finder described below and a function, numless, that 
uses spectrum slicing to determine the number of eigenvalues of Tj less than 
a value. Details of how to efficiently implement these subroutines are given 
by Parlett and Nour-Omid [PN085]. 

A root finding method, such as bisection or Newton’s method, can be 
used to find the eigenvalues in the ranges given by the algorithm in Figure 9 
[PN085]. Newton’s method is preferred for its fast convergence and because 
it generates the j th element in s, as a by-product, which allows for the inex- 
pensive computation of the error bound for 0, [PN085]. For safety’s sake, the 
Newton root finder is protected by a bisection root finder to ensure that New- 
ton’s method converges to the desired root. If the Ritz vector corresponding 
to a particular eigenvalue, 0,-, needs to be computed, inverse iteration can be 
used to compute s,-. Because the calculation of every eigenvalue is indepen- 
dent, this algorithm is inherently parallel. In order to save time, it may be 
beneficial to keep track of which eigenvalues of T have stabilized, as these do 
not need to be recomputed. The major difficulty in parallelizing this algo- 
rithm appears to be load balancing; it will take different numbers of Newton 
iterations to find each eigenvalue, and only occasionally will inverse iteration 
be necessary. 

The algorithm developed by the authors for solving the tridiagonal system 
also tracks the eigenvalues of Tj from step to step. This tracking is necessary 
for two reasons. First, selective orthogonalization requires the computation of 
the Ritz vectors corresponding to the eigenvalues of Tj that become “good” 
(as defined in Section 4) at step j. Those Ritz vectors can be used from 
step j - f 1 until the end of the Lanczos run if the eigenvalues in Tj+\ (and 
subsequent T.’s) that correspond to the eigenvalues in Tj can be identified. 
Second, the rate of convergence of a particular eigenvalue is predicted by 
tracking its convergence over several steps (the use of the convergence rate 
was described in a previous section). 
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bounded[i] = 0 , for t = 1, j 
do * = 1, j — 1 

if ((2 *fe < 0, - 0i- 1 ) and (2 *#,-,• < 9 i+i - 9,)) then 
probe = 9i + Pji 
less = numless (probe) 
if (less = t) then 
bounded [t] = t 

else /* t and * + 1 are the only values numless will return, if 
it returns something else, a grave error has occurred */ 
bounded[i + 1] = t 
endif 
endif 
enddo 
do i = 1, j 

if (bounded[t] = 0) then 
leftbound = 
rightbound = 0,- 

newtonroot(leftbound,rightbound,new0,-,new/?jj) 

else if (bound [i] = i) then 
leftbound = 0, — flji 
rightbound = 0; 

newtonroot (leftbound, rightbound, new0,-, new 

else if (bound[t] = i — 1) then 
leftbound = 0,_i 
rightbound = 0j_i + Pji-i 

newtonroot(leftbound, rightbound, new0,,new/9j f ) 

endif 

enddo 


Figure 9: Tridiagonal Eigenvalue Solver 
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8 The Solution of the system ( K — gM)x = y 

The solution to the possibly indefinite system, 

{I< - oM)x = y, (68) 

is normally the most time-consuming step in the Lanczos algorithm for trans- 
formations A, B, and C (unless there are very few non- zero elements in 
(A — aM)). Therefore, it makes sense to try to optimize this step of the 
algorithm as much as possible. Two approaches can be taken to solving 
this system: 1) the use of direct solution methods, or 2) the use of iterative 
solution methods. Because the problems under consideration can be very 
ill-conditioned, the use of iterative methods has been avoided. 

Because this paper is focused on the problem in which K and M are 
banded, the discussion in this section is limited to the banded case. In the 
vibration problem, because K is positive definite and M is semi-positive 
definite, if a < 0, then the system in Equation 68 is positive definite. In the 
buckling problem, because K is positive definite, if a = 0, then only K must 
be factored because transformation D is used. Because K and M are always 
symmetric, Choleski’s method can be used to solve these systems. Choleski’s 
method is the direct method of choice for this class of banded linear systems 
because it is stable and results in no destruction of the bandwidth [BKP76], 
Choleski’s method is used by LANZ for the vibration problem whenever 
<7 < 0 and in the buckling problem whenever a = 0. 

The system in Equation 68 can be indefinite whenever a > 0 in the 
vibration problem and may be indefinite in the buckling problem when <r 
is non-zero. When the system is indefinite, Choleski factorization will fail 
because a square root of a negative number will be taken, and the LDL T 
decomposition is not stable because the growth of elements in L cannot be 
bounded a priori [Wil65]. The methods of choice for factoring a full symmet- 
ric indefinite matrix are the Bunch-Kaufman method and Aasen’s method 
[BG76]. It was believed that both methods, however, would destroy the 
structure of a banded system and not be competitive with Gaussian elimi- 
nation with partial pivoting, which does not destroy the band structure but 
ignores symmetry [BK77]. To address this, the authors have developed a 
new method of implementing the Bunch-Kaufman algorithm which is the 
method of choice for factoring symmetric indefinite banded systems when 
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the systems have only a few negative eigenvalues [JP89]. This is exactly the 
case which arises when moving the shift in search of the lowest eigenvalues 
of Equation 1 in the vibration problem and is often the case in the buckling 
problem as well. The modified algorithm takes full advantage of the sym- 
metry of the system, unlike Gaussian elimination, and is therefore faster to 
execute and takes less storage space. LANZ uses this algorithm whenever 
the system can be indefinite. As an additional benefit, the inertia of the 
system can be obtained virtually for free [BK77]. 

Regardless of which factorization method is used, the system is only fac- 
tored once for each <t. After the factorization has taken place, each time 
the solution to Equation 68 is required, only back and forward triangular 
solutions (and a diagonal solution in the Bunch-Kaufman case) must be ex- 
ecuted. 
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Performance Analysis 

9.1 Vectorization 

From the analysis in Appendix A it appears that vectorizing LANZ would 
result in significant speedup of the solution procedure. The LANZ code 
was compiled using the Convex Vectorizing Fortran compiler [Cor87]. The 
code was executed using double precision on a Convex 220 in both vector 
and scalar modes. Seven free vibration problems and five buckling problems 
of varying sizes from the NASA Langley testbed were run. The problems 
consisted of varying sizes of three different structures. The first structure is a 
thin circular, cylindrical shell simply supported along its edges. The buckling 
eigenvalues for this structure are closely spaced and present a challenge for 
eigensolvers. The actual finite element model only needs to model a small 
rectangle of the cylinder to correctly simulate the behavior of the structure. 
A plot of the entire cylinder that shows the 15 degree rectangle of the cylinder 
that is modeled is given Figure 20 of Appendix A. The two lowest buckling 
modes for an axially-compressed cylinder are are also plotted in Figure 21 of 
Appendix A. The second structure is a composite (graphite-epoxy) blade- 
stiffened panel with a discontinuous center stiffener. The finite element model 
for this structure is shown in Figure 22 of Appendix A. The third structure is 
a model of a deployable space mast constructed at NASA Langley Research 
Center. A picture of the deployable mast along with a plot of the finite 
element model is showm in Figure 23 of Appendix A. Descriptions of the first 
two structures can be found in [Ste89a]. A description of the deployable mast 
can be found in [HWHB86]. In every problem, at least ten eigenpairs were 
found. All times in this section are given in seconds. In each problem, n will 
refer to the number of equations, and /? will refer to the semi-bandwidth of 
the K matrix. The execution times for the vibration problem with a diagonal 
mass matrix are given in Figure 10 where a speedup due to vectorization of 
up to 7.83 is shown. Speedups of up to 7.30 for the vibration problem with 
a consistent mass matrix are given in Figure 11. For the buckling problem, 
speedups of up to 7.79 can be observed in Figure 12. These speedups are 
similar to the speedups obtained by other linear algebra applications on the 
Convex 220. From these comparisons the conclusion can be drawn that 
significant speedup of the solution procedures due to vectorization can be 
achieved. 
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Figure 10: Vectorization Results for the Vibration Problem with a Diagonal 
Mass Matrix 
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Figure 1 1 : Vectorization Results for the Vibration Problem with a Consistent 
Mass Matrix 


Problem 

Size 

Vector 

(seconds) 

Scalar 

(seconds) 

Speedup 

Factor 

Mast 

n = 1980 
j3 = 58 

5.16 

14.72 

2.85 

Cylinder 

II I' 
«" S> 

0.43 

1.01 

2.35 

Cylinder 

II « 

5 55 
8 S 

5.18 

27.54 

5.32 

Cylinder 

n = 7644 
P = 385 

70.32 

510.75 

7.26 

Cylinder 

n = 12054 
(3 = 485 

150.57 

1172.39 

7.79 


Figure 12: Vectorization Results for the Buckling Problem 
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9.2 Comparison With Subspace Iteration 

The claim was made in Section 3 that Lanczos’s method is significantly faster 
than subspace iteration. The results presented in this section support this 
claim. The LANZ code was compared with the EIG2 processor from the 
NASA Langley testbed code. The EIG2 processor uses the subspace itera- 
tion method [Ste89b]. Both codes were compiled and executed as in Sub- 
section 9.1. The same problems that were solved in 9.1 were used for this 
comparison in which the the lowest ten eigenvalues were sought. Both pro- 
grams were able to find the lowest ten eigenvalues in every case, although 
EIG2 took an unusually large number of iterations (over three times the rec- 
ommended maximum) to find them in the Mast case for both the buckling 
and free vibration problems. The Mast problem has a difficult distribution 
of eigenvalues, and the LANZ code makes use of shifting to quickly find the 
eigenvalues. Both codes were directed to find the eigenvalues to a relative 
accuracy of 10 -4 . However, the subspace iteration code used an accuracy 
measure which was more lax than that used in the LANZ code. The mea- 
sure used in the subspace code, 

(AJ+ 1 - A‘)/A‘ + \ (69) 

where k is the iteration number, is only a check to determine whether an 
eigenvalue has stabilized relative to itself. In the LANZ code 


(|| Kyi - ||)M (70) 

is used to check the relative accuracy of the combination of the eigenvalue 
and the eigenvector. Therefore, the LANZ code is at a disadvantage to the 
subspace code in this comparison because the eigenpairs are computed to 
greater accuracy than in the subspace iteration code. 

When the results comparing the two codes are given, two times are re- 
ported for LANZ: the processing time required by the code and the total 
of the system and processing time required by the code. The two times are 
given because the EIG2 processor can only report its execution time as the 
total of system and processing time. For the free vibration problem with 
a diagonal mass matrix, LANZ is shown in Figure 13 to be about 7 to 14 
times faster than subspace iteration. In Figure 14, LANZ is shown to be 
about 7 to 26 times faster than subspace iteration for the vibration problem 
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Figure 13: LANZ vs. Subspace Iteration: Vibration Problem with Diagonal 
Mass Matrix 
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Figure 14: LANZ vs. Subspace Iteration: Vibration Problem with Consis- 
tent Mass Matrix 
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Figure 15: LANZ vs. Subspace Iteration: Buckling Problem 

with a consistent mass matrix. LANZ is shown to be about 6 to 21 times 
faster than subspace iteration for the buckling problem in Figure 15. 

LANZ’s advantage over subspace iteration appears to be diminishing as 
the problem sizes increase because the factorization of the matrices takes a 
larger proportion of the time as the matrix size increases. Because each code 
could use the same factorization technique, the time spent in factorization 
distorts the advantage that LANZ holds over subspace iteration. To more 
clearly illustrate the advantage of LANZ over subspace iteration, the time 
for factorizing (If - o M) was removed from the results in Figures 13, 14, 
and 15. Only the totals of system and processing time were accessible when 
computing the modified times. Although the time for triangular linear sys- 
tem solutions (the backward, forward, and diagonal linear solutions required 
at each step) is still included, the modified times will give the reader a bet- 
ter comparison of the time spent in the eigensolving routines. In Figure 16, 
LANZ now shows an advantage of up to 47.18 for the vibration problem with 
a diagonal mass matrix. For the vibration problem with a consistent mass 
matrix, a speedup of up to 31.31 can be observed in Figure 17. A speedup 
for the buckling problem of up to 23.64 is shown in Figure 18. In Figures 16 
and 17 the LANZ code used only one factorization per problem except for 
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Figure 16: Comparison without Factorization: Vibration Problem with a 
Diagonal Mass Matrix 

the mast problem where two factorizations were required for ten eigenval- 
ues to converge. In Figure 18 the LANZ code used only one factorization 
per problem to converge to ten eigenvalues except in the two large cylinder 
problems and the mast problem, where two factorization were required. 

9.3 Performance Benefits of Tracking Eigenvalues 

The value of tracking the eigenvalues will now be shown. In Section 7 an 
algorithm for tracking and computing the eigenvalues of Tj is given. The code 
was run on a Convex C-l computer for five free vibration problems from the 
NASA Langley testbed. Ten eigenvalues were sought for each problem. To 
assess the benefits of the tracking algorithm, the code was run with the 
tracking algorithm first turned on and then turned off. The M matrices in 
this experiment are diagonal; however, the benefits would be even greater 
for non-diagonal M matrices. Reductions in execution time of up to 23 
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Figure 17: Comparison without Factorization: Vibration Problem with a 
Consistent Mass Matrix 
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Figure 18: Comparison without Factorization: Buckling Problem 
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Figure 19: Execution time with and without tracking 

percent are shown in Figure 19. The data in Figure 19 are from a version of 
the program that existed prior to changes made in early 1989. The current 
version of the program will not work with the tracking algorithm turned off. 
The gain in execution time would actually be more marked if the the tracking 
algorithm could be turned off in the current version because other parts of 
the LANZ algorithm that aren’t affected by the tracking algorithm have 
been optimized. 

9.4 Multiple Eigenvalues 

Although the test problems from NASA Langley had no low eigenvalues 
of multiplicity greater than one, some of the eigenvalues in the Mast case 
were very closely clustered. However, the performance of the algorithm with 
exact multiple eigenvalues is of interest. Therefore, diagonal test matrices 
with multiple eigenvalues were constructed to test whether LANZ would 
reveal their presence. In these test cases, the correct number of copies of 
each eigenvalue were found by LANZ. 
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10 Concluding Remarks 

10.1 Conclusions 

For the large, generalized eigenvalue problem arising from two structural 
engineering applications, the vibration and buckling problems, the LANZ 
algorithm was shown to be superior to the subspace iteration method. Re- 
sults from several structural engineering problems were given to support this 
claim. LANZ is based on the Lanczos algorithm and makes use of spectral 
transformations, dynamic movement of a shift, and a modified version of 
selective reorthogonalization to quickly converge to desired eigenpairs. The 
dynamic shift-moving algorithm used by LANZ was described. The shifting- 
moving algorithm is based on a cost analysis of the Lanczos algorithm with 
spectral transformations and selective reorthogonalizations. A parallel algo- 
rithm for efficiently solving the tridiagonal matrices that arise when using 
Lanczos s method was also given. 

10.2 Future Work 

LANZ has been shown to perform well on vector machines, an important 
class of scientific computing machines. These classes show the most promise 
for solving very large problems. The next step is to shown that LANZ will 
perform weil on parallel and vector/parallel computers. An examination of 
tbe LANZ algorithm based on the analysis in Section 6 is the logical first 
step in determining a strategy for parallelizing LANZ. A possible next step 
is to use the Force programming language to parallelize the code [Jor87l. 

his language allows parallel loops to be easily expressed and can be used 
on several different shared-memory computers. The Force has been shown 
to be a good language for parallel linear algebra applications [JPV89] The 
outlined approach would most likely provide a good barometer with which 
to assess the performance of LANZ on parallel machines. 
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A Sequential and Vector Cost Analysis 

A step-by-step cost analysis for the Lanczos algorithm when using trans- 
formation B (shown in Figure 2) is given below for sequential and vector 
machines. 

Definitions: 

fix'- The semi-bandwidth of the K matrix. 

Hm'- The semi-bandwidth of the M matrix, 
n: The number of equations in the system, 
daxpy: A double precision vector operation that computes ax+y, 
where a is a scalar and x and y are vectors. 


Initialization 

1. ) Choose an initial guess, guess 

Small cost (0(cn)), but might be larger depending on the 
method used for choosing the guess. 

2. ) rp = (K — <j M)~ l M guess (Purifying rp) 

A. ) Formation of the matrix (K — crM) 

The matrix is formed from K and M and made available 
to the factorization routine. 

Sequential: fi\tn subtractions and multiplications 
Vector. 1 ^A/ft-length daxpy operation 

B. ) Factorization of ( K — erM) 

Using Bunch-Kaufman (or LDL T decomposition) as de- 
scribed in Section 8 
Sequential: n divisions 

0(nfix) multiplications 
0(nfi^/2) multiplications and additions 
Vector. n scalar divisions 

n /iK-length vector by scalar multiplications 
n^K daxpy operations of average length fix 

C. ) Forward Solve using factored matrix from B 

Sequential: 0(nfix) multiplications and divisions 
Vector. n //^-length daxpy operations 

D. ) Diagonal Solve using factored matrix from B 

Sequential: 3 n multiplications 
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3 .) 


4 -) 


2 n additions 

Vector. 3 n-length daxpy operations 

E.) Back Solve using factored matrix from B 

Sequential: 0(n^x) multiplications and divisions 
Vector. n /^'-length inner products 

Pi = M_ r 0 

An n x n banded matrix-vector multiplication 
Sequential : 0((2p# + l)n) multiplications 
0(2hku) additions 

Vector. n nx + 1-length inner products 
n p^-length daxpy operations 
& = ( r oPi ) 1/2 

An n-length inner product and a square root 
Sequential: n multiplications 
n — 1 additions 
1 square root 

Vector. 1 n-length inner product 
1 square root 


Program Loop 

5. ) For j = 1, maximum number of iterations 

6. ) Reorthogonalization phase 

Orthogonalize r,-_i and <?j_i against “good” Ritz vectors 
if necessary (see section on orthogonalization for details). 
Steps A and B are done only once and only if reorthog- 
onalization is needed. Steps C and E are done for each 
Ritz vector that is orthogonalized against Steps D 
and F are done for each Ritz vector that is orthogonalized 
against qj-\. 

A. ) <i = 

Same cost as 3 

B. ) t 2 = Mqj-i 

Same cost as 3 
c 0 7.- = yjt i 

Multiplication of an n-length vector by a scalar 
Sequential: n multiplications 
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Vector. 1 n-length vector by scalar multiplication 

D-) V’i = yjh 

Same cost as C 

E. ) Tj_i = Tj-1 - 7,y< 

Orthogonalize rj against j/,- 
Sequential : n multiplications 
n additions 

Vector. 1 n-length daxpy operation 

F. ) qj-i = qj-i ~ 1>iVi 

Same cost as 6E 
Orthogonalize r, against y,* 

7- ) <?j — r j — i / 

Division of an n-length vector by a scalar 
Sequential: n multiplications 
1 division 

Vector. 1 n-length vector by scalar multiplication 
1 division 

8- ) Pj = Pj/Pj 

Same cost as 7 

9. ) {_ K - aMYi = Pj 

Same cost as parts C, D, and E of 3 

10 . ) rj = Tj - qj-ifc 

Orthogonalize r, against qj-i 
Same cost as 6E 

11. ) Qj = rJ Pj 

Same cost as 4 except no square root is needed 

12 . ) rj = rj - gjatj 

Same cost as 10 

13. ) pj+i = Mr ] 

Same cost as 3 

14. ) gj-j-i = (rjp,-+i) 1/2 

15. ) Compu te the eigenvalue of Tj and the corresponding error bounds 

A. ) Calculate j eigenvalues via Newton’s method 

Sequential and Vector. Variable, but very small (0(j 2 )) 

B. ) Calculate j error bounds, /?•,,• 

Sequential: j multiplications 
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16. ) Compute any converged Ritz Vectors 

Cost per Ritz vector of computing j/ t - 

A. ) Compute correction factor, w { = 1 

Sequential: 3 j multiplications 
2 j additions 
j multiplications 
1 division 

Vector. 3 jdength daxpy operations 

1 j-length vector by scalar multiplication 
1 division 

B. ) Compute y { = 

Sequential : nj multiplications 
n(j — 1) additions 

Vector n j-length inner products 

17. ) Halt if enough eigenvalues have been found 

18. ) End of Loop 
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Figure 21: Lowest two buckling modes of an axially-compressed cylinder 
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Figure 22: Composite blade-stiffened panel with a discontinous stiffener 
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